Parallel Algorithms for Solving Markov Decision Process

Authors

  • Qi Zhang
  • Guangzhong Sun
  • Yinlong Xu
Abstract

Markov decision processes (MDPs) provide the foundation for a number of problems in artificial intelligence research, automated planning, and reinforcement learning. MDPs can be solved efficiently in theory. For large scenarios, however, further investigation is needed to find practical algorithms. Algorithms for solving MDPs have natural concurrency. In this paper, we present parallel algorithms based on dynamic programming. We analyze the computation cost and communication complexity of this method. Experimental results demonstrate excellent speedups and scalability.
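The abstract does not spell out the algorithms, so the following is only a minimal sketch of the kind of concurrency it refers to, assuming plain value iteration as the dynamic programming method. Within one sweep, the Bellman backup of each state reads only the previous value vector, so blocks of states can be updated independently and merged afterwards. The names `backup_block` and `parallel_value_iteration`, the round-robin partition, and the data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as the reward) are illustrative assumptions, not the authors' method.

```python
from concurrent.futures import ThreadPoolExecutor


def backup_block(states, P, R, V, gamma):
    """Bellman backup for one block of states.

    Reads the previous value vector V only, so blocks never conflict.
    """
    out = {}
    for s in states:
        out[s] = max(
            R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
            for a in range(len(P[s]))
        )
    return out


def parallel_value_iteration(P, R, gamma=0.9, tol=1e-8, workers=2):
    """Synchronous value iteration with the state space split across workers."""
    n = len(P)
    V = [0.0] * n
    # Hypothetical round-robin partition: worker i gets states i, i+workers, ...
    blocks = [range(i, n, workers) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Compute phase: every block is backed up against the same V.
            futures = [pool.submit(backup_block, b, P, R, V, gamma) for b in blocks]
            # Merge phase: gather partial results into the next value vector.
            new_V = V[:]
            for f in futures:
                for s, v in f.result().items():
                    new_V[s] = v
            delta = max(abs(x - y) for x, y in zip(new_V, V))
            V = new_V
            if delta < tol:
                return V


# Tiny deterministic example: state 0 can stay (reward 0) or move to
# state 1 (reward 1); state 1 loops on itself with reward 2.
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)]]]
R = [[0.0, 1.0], [2.0]]
V = parallel_value_iteration(P, R, gamma=0.5)
```

With a discount of 0.5, the fixed point for this example is V(1) = 2 / (1 - 0.5) = 4 and V(0) = 1 + 0.5 * 4 = 3. Threads are used here only to show the structure of the compute and merge phases; in CPython they would not be expected to speed up pure-Python arithmetic, and a distributed setting would replace the merge step with explicit communication between processes.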

Similar resources

A Parallel Algorithm for POMDP Solution

Most exact algorithms for solving partially observable Markov decision processes (POMDPs) are based on a form of dynamic programming in which a piecewise-linear and convex representation of the value function is updated at every iteration to more accurately approximate the true value function. However, the process is computationally expensive, thus limiting the practical application of POMDPs i...

Cold standby redundancy optimization for nonrepairable series-parallel systems: Erlang time to failure distribution

In modeling a cold standby redundancy allocation problem (RAP) with imperfect switching mechanism, deriving a closed form version of a system reliability is too difficult. A convenient lower bound on system reliability is proposed and this approximation is widely used as a part of objective function for a system reliability maximization problem in the literature. Considering this lower bound do...

Producing efficient error-bounded solutions for transition independent decentralized MDPs

There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov decision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates. Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading ...

A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

The running time of the classical algorithms of the Markov Decision Process (MDP) typically grows linearly with the state space size, which makes them frequently intractable. This paper presents a Modified Policy Iteration algorithm to compute an optimal policy for large Markov decision processes in the discounted reward criteria and under infinite horizon. The idea of this algorithm is based o...

LIDS-P-2172: Asynchronous Stochastic Approximation and Q-Learning

We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.

Publication date: 2009